The tl;dr version: in a given box of
Lucky Charms, for each bowl eaten we estimate a decrease of
approximately 2.7 total charms per bowl on average. The weight of cereal
also appears to be significant and for an increase of 1g of cereal we
estimate an increase of approximately 0.5 charms on average—for bowl
held constant. The interaction between bowl and weight is not
statistically significant.
See this GitHub repository for all of the data, code, photos, etc.
Some years ago there was a story about somebody investigating the question of whether or not “Double Stuf” Oreos were truly double-stuffed. (They’re not.) It was a neat experiment, and there is quite a bit of material on the Internet about it nowadays, check here to get started. It evidently caused enough splash that some teachers were said to be using it for an introductory statistics activity in the public schools, and indeed students have told me they repeated the experiment at their own school.
Against this backdrop and in the summer of 2023 I was eating a bowl of Lucky Charms one morning. The box was near empty, and I thought to myself, “I’d almost rather throw the rest of this box away and open up a brand new box of Lucky Charms!” Now, if you’re anything like me, and millions of other people, then you love Lucky Charms, and you’ve loved them for as long as you can remember. They truly live up to their tagline: Magically Delicious! However, it occurred to me on that summer morning that the latter bowls of the box don’t seem to be quite so magical as the earlier bowls are. They’ve lost something. (The charms, of course.) But was it my imagination? Is this effect real? And if so, is it detectable?
I happened to be assigned to a summer upper-division undergraduate Probability & Statistics class at the time and the 4 students and I were determined to find out.
After some discussion, we decided on our materials and methods.
The Lucky Charms were purchased from our local retailer—Wal-Mart. There wasn’t anything special about the number of boxes, it was simply the number I could carry with two hands to the sixth floor of Cafaro Hall in one trip. The kitchen scale was for measuring the weight of cereal, which the team thought might be important, and the scale would also help with data collection because we didn’t want to be overly concerned with sampling the exact same quantity of cereal every time.
For the purposes of this experiment, a “bowl” was taken to be approximately 1 serving as recommended on the box (1 cup or 36g), even though nobody but a tiny magical leprechaun can get by on only 36g of Lucky Charms for breakfast. The team was not overly careful with the bowl size, and anything close to 1 cup was considered good enough. We were accounting for mass of cereal with the kitchen scale anyway, and were shooting for a healthy range of observed weights.
A bowl was scooped directly out of the box, weighed, and then poured out on a table surface for counting. The toasted oats were separated from the marshmallows and discarded. Then the following eight (8) charm types were recognized and their number recorded: Rainbows, Pink Hearts, Purple Horseshoes, Blue Moons, Green Clovers, Unicorns, Tasty Red Balloons and Orange Stars.
Typically There were little bits and pieces of charms mixed about in the bowl; not every charm is 100% whole. To deal with this, the team attempted to classify the bit in the type of charm (Green Clover, Blue Moon, etc.), and if the type could be determined, then that bit was counted as 1 in the respective category. If the bit was too small/nondescript for type identification then it was discarded.
Data were collected over two separate class meetings—we had other statistical ground to cover, after all. Two pairs of students collected and counted at once, and I helped with the scale and recording a hard copy of the weight values as they were called out and entered into the computer.
The plastic bowls + cereal were weighed together each round, and then the weight of the bowl (measured at the start of the experiment) was subtracted from the observed total weight. The charms were entered into their respective columns and totaled.
Box: the box number (1 through 6)Bowl: the sequential bowl for each box (ranges from 1
to around 12)Observation: the observed order of bowls across boxes
(1 to 69)Totweight: weight of each bowl + cereal, in gramsWeight: of cereal, in grams, after subtracting the
weight of the respective plastic bowlHearts, Stars, etc: how many of that charm
in that bowlTotcharms: sum total of the assorted charmsHere is a look at the top of the data set (first 6 rows):
head(Lucky)
## # A tibble: 6 × 14
## Box Bowl Observation Totweight Weight Hearts Starts Horseshoes Clovers
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 60.5 32.1 5 5 14 3
## 2 1 2 2 69.9 41.5 7 1 7 8
## 3 1 3 3 66.2 37.8 5 4 4 4
## 4 1 4 4 63.8 35.4 5 6 8 1
## 5 1 5 5 75.5 47.2 4 1 5 4
## 6 1 6 6 77.3 48.9 6 4 4 4
## # ℹ 5 more variables: Moons <dbl>, Unicorns <dbl>, Balloons <dbl>,
## # Rainbows <dbl>, Totcharms <dbl>
The average Weight was approximately 46.3g, the naximum
number of a particular charm was 15 (Pink Hearts and Purple Horseshoes)
in any one bowl, and there are many, many other statistics and tables we
can construct for this data set. For the task at hand, though, at the
moment we are primarily concerned with Totcharms, and how
that response variable is related to Bowl, and maybe
Weight to a lesser extent.
Here is a graph of Totcharms by Bowl,
colored by Box.
Lucky |> ggplot(aes(x = Bowl, y = Totcharms, color = Box)) +
geom_point(size = 3) +
labs(y = '# Charms') -> p1
p1
Here we see a clear decreasing trend in Totcharms as
Bowl increases, and the pattern is surprisingly linear.
There may be a slight curvature. The colors are difficult to pick out,
so let’s make a line plot and highlight a couple of series.
sizes <- c(2.5, 1, 2.5, 1, 1, 1)
alphas <- c(1, 0.2, 1, 0.2, 0.2, 0.2)
Lucky |> ggplot(aes(x = Bowl, y = Totcharms)) +
geom_line(aes(colour = Box, linewidth = Box, alpha = Box)) +
scale_discrete_manual("linewidth", values = sizes) +
scale_alpha_manual(values = alphas, guide = "none")
We can see a general trend downward for each series, but the path to get there varies for every box. Notice how Box 3 starts high and stays high for a few bowls before dropping off smoothly until after Bowl 10, when it crashes. Low how Box 1 starts relatively low, but then increases after Bowl 5, peaks at Bowl 8, then nosedives down to Bowl 12. It’s as if the charms were settled near the top of Box 3, but were more concentrated in more of a pocket near the middle of Box 1. Some boxes bounce around a great deal, while some boxes are more of a straight line downward. Put it all together, though, and the overall trend is decreasing, and linear. Note that all boxes lasted to Bowl 11, but only 2 boxes had 12 bowls and only a single box (Box 4) made it to Bowl 13.
Now let’s take a look at Totcharms versus
Weight.
Lucky |> ggplot(aes(x = Weight, y = Totcharms, color = Box)) +
geom_point(size = 3) +
labs(x = 'Weight (g)', y = '# Charms') -> p2
p2
This is a much noisier plot—as expected. We have a nice range of
weights, from a minimum less than 30g up to a maximum near 70g. Note
that the lowest weights are usually associated with the last bowl of the
box: there isn’t enough remaining to make a full 8oz bowl. And notice
the one bowl which was extremely heavy. It isn’t obvious what the
explanation is, but if we dig a little deeper and plot
weight versus bowl we can get a hint:
Lucky |> ggplot(aes(x = Bowl, y = Weight, color = Box)) +
geom_point(size = 3) + ylim(5, 75) +
labs(y = 'Weight (g)') -> p3
p3
We see that the extra-heavy bowl was the last Bowl = 12
of Box = 1. I don’t remember that particular data point,
but since it was the first time the team had ever finished a box, near
the end it would have been difficult to judge how much was left and we
just poured the remainder of the box into that last bowl—I do this all
the time at breakfast when I am nearing the end of a box of cereal. Just
looking at the points on the graph, if that 12th bowl of 70g would have
been split into (maybe) 40g and 30g, then we would have had two boxes
that lasted to 13 bowls instead of one, and maybe the models below would
have fit better. Alas! Such is the scientific enterprise.
While the relationship between Totcharms and
Weight does not appear to be strongly linear, there is a
hidden relationship between Totcharms, Bowl,
and Weight, which is best visualized with a 3D plot.
library(plotly)
Lucky$Box <- as.factor(Lucky$Box)
fig <- plot_ly(Lucky, x = ~Bowl, y = ~Weight, z = ~Totcharms, color = ~Box)
fig <- fig %>% add_markers()
fig <- fig %>% layout(scene = list(xaxis = list(title = 'Bowl'),
yaxis = list(title = 'Weight (g)'),
zaxis = list(title = '# Charms')),
legend=list(title=list(text='Box')))
fig
Note that the 3D plot is interactive. Go ahead, spin it around, zoom, pan—check it out. If you spin it around just right you will see that the dots lie more-or-less on a flat plane in 3D-space. This is exactly the kind of relationship we are looking for in a multiple linear regression model (which we’ll get to in a minute.)
Now let’s try to quantify the linear relationship between these
variables. We will start with a simple linear regression model relating
Totcharms to Bowl.
Here is the model:
mod1 <- lm(Totcharms ~ Bowl, data = Lucky)
summary(mod1)
##
## Call:
## lm(formula = Totcharms ~ Bowl, data = Lucky)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.7629 -5.7629 -0.4327 6.2277 22.2277
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.1309 2.1237 25.960 < 2e-16 ***
## Bowl -2.6698 0.2985 -8.945 4.81e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.313 on 67 degrees of freedom
## Multiple R-squared: 0.5442, Adjusted R-squared: 0.5374
## F-statistic: 80.01 on 1 and 67 DF, p-value: 4.807e-13
We see that Bowl is a highly significant predictor for
Totcharms. The slope is approximately \(-2.7\), in other words, for each additional
bowl of Lucky Charms eaten we estimate the average
Totcharms to decrease by 2.7 charms. Our coefficient of
determination is \(R^2 = 0.5442\), that
is, approximately 54% of the variance in Totcharms is
explained by the regression model with Bowl as a predictor.
There is a whole discussion here about residual analysis that we are
going to skip, but suffice it to say that the residual plots are
relatively well-behaved. Let’s check out a fitted line plot with
confidence bands for the regression line (the default):
p1 + geom_smooth(method = "lm", aes(group=1), colour="black")
That’s a nice relationship with a clear decreasing trend.
We will do the same thing for Weight, ignoring
Bowl for the moment. Here we go:
mod2 <- lm(Totcharms ~ Weight, data = Lucky)
summary(mod2)
##
## Call:
## lm(formula = Totcharms ~ Weight, data = Lucky)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.0151 -8.7745 0.6901 7.8328 24.4701
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.1370 10.5650 2.095 0.0399 *
## Weight 0.3502 0.2256 1.552 0.1254
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.1 on 67 degrees of freedom
## Multiple R-squared: 0.0347, Adjusted R-squared: 0.02029
## F-statistic: 2.409 on 1 and 67 DF, p-value: 0.1254
Here too, Weight is a highly significant predictor for
Totcharms. The slope is approximately \(0.35\), in other words, for each additional
1g of Lucky Charms in a bowl we estimate the average
Totcharms to increase by 0.35 charms. This makes sense:
more cereal, more charms. The coefficient of determination is pretty
bad: \(R^2 = 0.0347\), that is,
approximately NONE% of the variance in Totcharms is
explained by the regression model with Weight as a
predictor. (!!) That’s okay, Weight was more of a
supplementary measure to help control for variability in the scooped
cereal amounts. The residual analysis isn’t as pretty here, which makes
sense, and we would expect some problems anyway with the extreme values
on the high and low ends. For the sake of completeness we will include
another fitted line plot:
p2 + geom_smooth(method = "lm", aes(group=1), colour="black")
I wanted to use the ggpubr package to put these plots
together and try to save some space, but the plots looked cramped and
weren’t very informative. Anyway, this is what I would have done.
# library(ggpubr)
# ggarrange(p1 + geom_smooth(method = "lm", aes(group=1), colour="black"),
# p2 + geom_smooth(method = "lm", aes(group=1), colour="black"),
# align = 'h', labels=c('A', 'B'), legend = "right",
# common.legend = TRUE)
Now we get to the fun part: we’ve explored the relationship between
Totcharms and Bowl and
Totcharms ~ Weight individually, but what if we put them
together? Let’s find out:
mod3 <- lm(Totcharms ~ Bowl + Weight, data = Lucky)
summary(mod3)
##
## Call:
## lm(formula = Totcharms ~ Bowl + Weight, data = Lucky)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.8825 -5.4425 -0.9975 5.2475 26.5304
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.3168 6.8655 4.853 7.78e-06 ***
## Bowl -2.7552 0.2796 -9.855 1.35e-14 ***
## Weight 0.4819 0.1452 3.318 0.00148 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.754 on 66 degrees of freedom
## Multiple R-squared: 0.6094, Adjusted R-squared: 0.5976
## F-statistic: 51.49 on 2 and 66 DF, p-value: 3.363e-14
Check it out! Both Bowl and Weight are
still highly significant predictors of Totcharms.
The slopes are nearly identical, but the slope for Weight
increased to nearly 0.5 charms for each additional 1g of cereal. The
slope for Bowl is still \(-2.7\). Our (adjusted) Multiple \(R^2\) has increased to almost 60%—this is
notable considering the small dataset, the general noise and design
choices for the experiment (every bit counts as 1, etc.). It’s kind of a
surprise that the data weren’t more mischievous and belligerent. I wish
more datasets I work with on a daily basis were so good-natured.
We’ve got to do this one. It is a really cool plot to see and play with, but the code is more involved than the other examples above. Never fear, you can check it all out in this GitHub Gist here. Let’s get on with the plot: